This is the final project of the IBM Data Science course on Coursera. In this project, the student is asked to realize an idea of their own using geolocation information.
This notebook searches for the best neighborhood in Toronto to live in as a cyclist. For this purpose, the data from the previous week is used to explore locations in Toronto. Bicycle-related data is retrieved from OpenStreetMap and Foursquare.
The first step is to determine which information can be used:
With this data I will group the neighborhoods into three clusters. The three clusters represent a rating for:
The data is obtained from OpenStreetMap (OSM) and Foursquare. OSM is an open-source project, and there are several APIs for retrieving information. Foursquare is a commercial company that provides location-based data. The free, limited developer access is used here.
In the first step I try to get the outer boundaries of the boroughs of Toronto. I do this with OpenStreetMap because it is free and has a well-documented API. First I install a package for downloading data from OSM, the OSMPythonTools:
!pip3 install OSMPythonTools
Here are the libraries which will be used in the notebook:
import pandas as pd
import numpy as np
import json
import copy
from OSMPythonTools.overpass import Overpass
ovp = Overpass()
from OSMPythonTools.api import Api
api = Api()
import folium # map rendering library
import requests # library to handle requests
# import k-means from clustering stage
from sklearn.cluster import KMeans
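import os
# Credentials are read from the environment instead of being stored in the notebook.
# The variable names below are hypothetical; set them before running.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret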
VERSION = '20201113'
LIMIT = 500
To draw the boundaries of the boroughs correctly, the ways have to be sorted after download.
def SortNode(ways):
    """Sort the ways in place so that each way starts at the node where the previous one ends."""
    index = 0
    sortidx = 0
    while sortidx < (len(ways) - 1):
        ways_idx = ways[sortidx + 1].copy()
        ways_idx_rev = ways[sortidx + 1].copy()
        ways_idx_rev.reverse()
        if ways[sortidx][-1] == ways_idx[0]:
            # The next way already continues the current one
            sortidx = sortidx + 1
        elif ways[sortidx][-1] == ways_idx_rev[0]:
            # The next way continues the current one when reversed
            ways[sortidx + 1] = ways_idx_rev
            sortidx = sortidx + 1
        else:
            # Search the remaining ways for one that continues the current way
            index = sortidx + 2
            if index >= len(ways):
                index = 0
            while index != 1:
                ways_idx = ways[index].copy()
                ways_idx_rev = ways[index].copy()
                ways_idx_rev.reverse()
                if ways[sortidx][-1] == ways_idx[0]:
                    ways.pop(index)
                    ways.insert(sortidx + 1, ways_idx)
                    sortidx = 0
                    index = 1
                    break
                elif ways[sortidx][-1] == ways_idx_rev[0]:
                    ways.pop(index)
                    ways.insert(sortidx + 1, ways_idx_rev)
                    sortidx = 0
                    index = 1
                    break
                else:
                    index = index + 1
                    if index >= len(ways):
                        index = 0
            sortidx = sortidx + 1
    return ways
def OverpassQuery(query):
    """Run an Overpass query and return the list of elements from the JSON response."""
    result_w = ovp.query(query, timeout=100)
    res_json = result_w.toJSON()
    res_elements = res_json['elements']
    return res_elements
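The idea behind `SortNode` can be illustrated on a toy example. The sketch below is not the function above but a simplified variant, written under the assumption that the segments form one single chain; `chain_ways` and the node IDs are made up for illustration.

```python
def chain_ways(ways):
    """Order way segments so that each one starts where the previous one ends.

    Simplified illustration: assumes the segments form a single chain.
    """
    remaining = [list(w) for w in ways]
    chain = [remaining.pop(0)]
    while remaining:
        tail = chain[-1][-1]
        for i, seg in enumerate(remaining):
            if seg[0] == tail:
                # Segment continues the chain in its stored direction
                chain.append(remaining.pop(i))
                break
            if seg[-1] == tail:
                # Segment continues the chain only when reversed
                chain.append(remaining.pop(i)[::-1])
                break
        else:
            raise ValueError("no segment continues the chain")
    return chain


# Hypothetical node IDs: three segments given out of order, one of them reversed.
segments = [[1, 2, 3], [5, 6, 1], [5, 4, 3]]
ordered = chain_ways(segments)
print(ordered)
```

After sorting, the last node of each segment equals the first node of the next one, which is what a closed boundary polygon needs.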
For the OSM API I have to create queries for the different boroughs. These are string concatenations of the OSM area name and keywords or tags that filter the returned objects. OSM returns nodes, ways, or relations. To describe boundaries, OSM uses ways, which consist of many nodes. I downloaded the nodes for every borough, sorted them, and show them on a Folium map.
borough = ["Scarborough", "North York", "Old Toronto", "Etobicoke", "York", "East York"]
# One boundary query per borough, in the same order as the borough list
queries = [
    f'area[name="Toronto"];relation["boundary"="administrative"]["name"="{b}"](area);way(r);out qt;'
    for b in borough
]
overpass_response = []
for x in queries:
    tmp = OverpassQuery(x).copy()
    overpass_response.append(tmp)
ways = []
for y in overpass_response:
    ways_inner = []
    for x in y:
        ways_inner.append(x['nodes'])
    ways.append(ways_inner)
for x in ways:
    x = SortNode(x)
ways_coordinates = []
node_list = []
bound_lat_max = -360
bound_lat_min = 360
bound_lon_max = -360
bound_lon_min = 360
for bor in ways:
    ways_coordinates_bor = []
    node_list_inner = []
    for w in bor:
        ways_coordinates_inner = []
        for x in w:
            # Look up the coordinates of every node and track the overall bounding box
            result_ways = api.query(f'node/{x}')
            lat, lon = result_ways.lat(), result_ways.lon()
            ways_coordinates_inner.append([lat, lon])
            node_list_inner.append([lat, lon])
            bound_lat_max = max(bound_lat_max, lat)
            bound_lat_min = min(bound_lat_min, lat)
            bound_lon_max = max(bound_lon_max, lon)
            bound_lon_min = min(bound_lon_min, lon)
        ways_coordinates_bor.append(ways_coordinates_inner)
    node_list.append(node_list_inner)
    ways_coordinates.append(ways_coordinates_bor)
print(f'bound_lat_min: {bound_lat_min}, bound_lon_min: {bound_lon_min}, bound_lat_max: {bound_lat_max}, bound_lon_max: {bound_lon_max}')
# GeoJSON expects [longitude, latitude], so swap the coordinate order
node_list_reverse = []
for x in node_list:
    tmp_inner = []
    for i in x:
        tmp_inner.append([i[1], i[0]])
    node_list_reverse.append(tmp_inner)
To illustrate the approach, it helps to view the boroughs on a map. The map shows the boundaries and areas of the boroughs in color. The boundaries are later used to select the points for comparing the boroughs with respect to bicycle-related features.
latitude = 43.7134408
longitude = -79.541716
map_select = folium.Map(location=[latitude, longitude], zoom_start=10)
color_map = ['red', 'green', 'blue', 'yellow', 'violet', 'black']
for x, col in zip(node_list, color_map):
    folium.PolyLine(x, fill = True, fill_color=col, fill_opacity =0.6).add_to(map_select)
print(f'bound_lat_min: {bound_lat_min}, bound_lon_min: {bound_lon_min}, bound_lat_max: {bound_lat_max}, bound_lon_max: {bound_lon_max}')
map_select.fit_bounds([[bound_lat_min, bound_lon_min], [bound_lat_max, bound_lon_max]])
map_select
Boroughs of Toronto:
| Borough | Color |
|---|---|
| Scarborough | Red |
| North York | Green |
| Old Toronto | Blue |
| Etobicoke | Yellow |
| York | Violet |
| East York | Black |
To measure the length of the available cycling routes, we extract the bicycle lanes and tracks from OSM and calculate their total length per borough. First, let's look at the bicycle tracks of every borough.
node_list_string = []
for n in node_list:
    # Build the "lat lon lat lon ..." string used by the Overpass poly filter
    node_tmp = f''
    for x in n:
        node_tmp = node_tmp + ' ' + str(x[0]) + ' ' + str(x[1])
    node_list_string.append(node_tmp)
#print(node_list_string[2])
borough_ways_geo = []
for b in node_list_string:
    # Ways tagged for bicycle use inside the borough polygon
    query = (
        f'(way["cycleway"](poly:"{b}");way["bicycle"="yes"](poly:"{b}");'
        f'way["segregated"](poly:"{b}");way["highway"="cycleway"](poly:"{b}"););'
        'out geom qt;'
    )
    tmp = OverpassQuery(query)
    ways_geo = []
    for w in tmp:
        geo_loc = []
        for n in w["geometry"]:
            geo_loc.append([n["lat"], n["lon"]])
        ways_geo.append(geo_loc)
    borough_ways_geo.append(ways_geo)
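The polygon filter used in the queries above takes the boundary as one string of alternating latitude and longitude values. A minimal sketch of that formatting step is shown below; the helper names `poly_filter` and `cycleway_query` and the triangle coordinates are assumptions for illustration, and only one of the four tag filters is included.

```python
def poly_filter(coords):
    """Format [(lat, lon), ...] as the "lat lon lat lon ..." string for an Overpass poly filter."""
    return " ".join(f"{lat} {lon}" for lat, lon in coords)


def cycleway_query(coords):
    """Build a query for ways tagged highway=cycleway inside the polygon."""
    return f'way["highway"="cycleway"](poly:"{poly_filter(coords)}");out geom qt;'


# Hypothetical triangle, not a real borough boundary.
triangle = [(43.6, -79.5), (43.7, -79.4), (43.6, -79.3)]
print(cycleway_query(triangle))
```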
latitude = 43.7134408
longitude = -79.541716
map_select = folium.Map(location=[latitude, longitude], zoom_start=10, tiles='Openstreetmap',)
color_map = ['red', 'green', 'blue', 'yellow', 'violet', 'black']
for b, c, n in zip(borough_ways_geo, color_map, node_list):
    for x in b:
        folium.PolyLine(x, fill = False, color=c, fill_opacity =0.6).add_to(map_select)
    folium.PolyLine(n, color=c, fill = True, fill_color=c, fill_opacity =0.2).add_to(map_select)
print(f'bound_lat_min: {bound_lat_min}, bound_lon_min: {bound_lon_min}, bound_lat_max: {bound_lat_max}, bound_lon_max: {bound_lon_max}')
map_select.fit_bounds([[bound_lat_min, bound_lon_min], [bound_lat_max, bound_lon_max]])
map_select
Here you can see the ways that are marked for bicycle use in OSM. Every borough is shown in a different color. Next, let me show the total length of these ways for each borough.
borough_ways_length = pd.DataFrame([], index=borough, columns=["Number", "Length in km"])
for b, i in zip(node_list_string, borough):
    # Let Overpass count the ways and sum their length in meters
    query_length = (
        f'(way["cycleway"](poly:"{b}");way["bicycle"="yes"](poly:"{b}");'
        f'way["segregated"](poly:"{b}");way["highway"="cycleway"](poly:"{b}"););'
        'make statistics number=count(ways), length=sum(length());out;'
    )
    tmp = OverpassQuery(query_length)
    borough_ways_length.loc[i] = [tmp[0]["tags"]["number"], float(tmp[0]["tags"]["length"]) / 1000]
borough_ways_length
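In the cell above the length is summed by the Overpass server. As a rough cross-check, the length of a way could also be estimated locally from its coordinates with the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function names and the sample way are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polyline_length_km(points):
    """Sum of the distances between consecutive points of a way."""
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))


# Hypothetical way with three points, not taken from OSM.
way = [(43.65, -79.40), (43.66, -79.40), (43.66, -79.39)]
print(round(polyline_length_km(way), 2))
```

Since `borough_ways_geo` stores each way as a list of `[lat, lon]` pairs, `polyline_length_km` could be applied to its entries. The totals may differ somewhat from the server-side values.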
The next step is to show the length in kilometers as a choropleth map with Folium. The absolute length is shown on a scale from white to purple: the borough with the fewest kilometers is nearly white and the borough with the most is purple.
geojson = f'{{"type":"FeatureCollection","features":['
for x, b in zip(node_list_reverse, borough):
    geojson = geojson + (
        f'{{"type":"Feature","id":"{b}","properties":{{"name":"{b}"}},'
        f'"geometry":{{"type":"Polygon","coordinates":[{x}]}}}},'
    )
geojson = geojson[:-1] + f']}}'
m = folium.Map(location=[latitude, longitude], zoom_start=10)
folium.Choropleth(
    geo_data=geojson,
    name='choropleth',
    data=borough_ways_length["Length in km"],
    key_on='feature.id',
    fill_color='BuPu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Kilometers of bicycle tracks',
    bins=9
).add_to(m)
folium.LayerControl().add_to(m)
m.fit_bounds([[bound_lat_min, bound_lon_min], [bound_lat_max, bound_lon_max]])
m
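The GeoJSON above is assembled by string concatenation. An alternative sketch builds the same structure as a Python dictionary and lets the `json` module handle the serialization. The function name `to_feature_collection` and the square are assumptions for illustration. The ring is also closed explicitly, because the GeoJSON specification requires the first and last position of a polygon ring to be identical.

```python
import json


def to_feature_collection(names, rings):
    """Build a GeoJSON FeatureCollection from names and [[lon, lat], ...] rings."""
    features = []
    for name, ring in zip(names, rings):
        ring = [list(p) for p in ring]
        if ring and ring[0] != ring[-1]:
            ring.append(ring[0])  # close the ring as required by GeoJSON
        features.append({
            "type": "Feature",
            "id": name,
            "properties": {"name": name},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical square, coordinates in GeoJSON order (longitude, latitude).
square = [[-79.5, 43.6], [-79.4, 43.6], [-79.4, 43.7], [-79.5, 43.7]]
demo = to_feature_collection(["Demo"], [square])
print(json.dumps(demo)[:60])
```

In the notebook, the inputs would correspond to `borough` and `node_list_reverse`.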
As an alternative, I visualize the length in kilometers relative to the area of each borough. So let me divide the length by the area of each borough.
borough_data = borough_ways_length.transpose()
borough_data.loc["Area"] = [187.7, 176.9, 97.2, 123.9, 23.2, 21.3]
borough_data.loc["Length per square km"] = (borough_data.loc["Length in km"].astype(float) / borough_data.loc["Area"].astype(float)).round(2)
borough_data = borough_data.transpose()
borough_data
geojson = f'{{"type":"FeatureCollection","features":['
for x, b in zip(node_list_reverse, borough):
    geojson = geojson + (
        f'{{"type":"Feature","id":"{b}","properties":{{"name":"{b}"}},'
        f'"geometry":{{"type":"Polygon","coordinates":[{x}]}}}},'
    )
geojson = geojson[:-1] + f']}}'
m = folium.Map(location=[latitude, longitude], zoom_start=10)
folium.Choropleth(
    geo_data=geojson,
    name='choropleth',
    data=borough_data["Length per square km"],
    key_on='feature.id',
    fill_color='BuPu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Length per square km',
    bins=9
).add_to(m)
folium.LayerControl().add_to(m)
m.fit_bounds([[bound_lat_min, bound_lon_min], [bound_lat_max, bound_lon_max]])
m
In this representation, the larger boroughs appear less attractive to cyclists.
To get the number of parks in each borough I used Foursquare. I created a query for each borough, stored the number of results in a list, and added it to the comparison dataframe. The second step was to relate the number of parks to the area and add that as a further column to the dataframe.
foursquare_query = 'parks'
borough = ["Scarborough", "North York", "Old Toronto", "Etobicoke", "York", "East York"]
foursquare_parks = []
for b in borough:
    near = f'{b}, ON, Canada'
    url = f'https://api.foursquare.com/v2/venues/explore?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&near={near}&v={VERSION}&query={foursquare_query}&limit={LIMIT}'
    results = requests.get(url).json()
    foursquare_parks_geo = []
    for x in results["response"]["groups"][0]["items"]:
        #foursquare_parks_geo.append([x["venue"]["location"]["lat"], x["venue"]["location"]["lng"]])
        foursquare_parks_geo.append([x["venue"]["name"]])
    # Only the number of returned venues is kept for each borough
    foursquare_parks.append(len(foursquare_parks_geo))
borough_data["Parks"] = foursquare_parks
borough_data["Parks per square km"] = (foursquare_parks / borough_data["Area"].astype(float)).round(2)
borough_data
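The request URL above is built by hand, and the `near` value contains spaces and commas. A small sketch of how the same URL could be assembled with proper percent-encoding is shown below. It uses only the endpoint and parameter names that appear in the cell above; the helper name `explore_url` and the placeholder credentials are assumptions, and no request is sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_url(near, query, client_id, client_secret, version, limit):
    """Return the explore URL with all parameters percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "near": near,
        "v": version,
        "query": query,
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Placeholder credentials for illustration only.
url = explore_url("North York, ON, Canada", "parks", "MY_ID", "MY_SECRET", "20201113", 500)
print(url)
```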
The next two maps show the number of parks and the number of parks relative to the area. The attractiveness of a borough increases from yellow to green.
geojson = f'{{"type":"FeatureCollection","features":['
for x, b in zip(node_list_reverse, borough):
    geojson = geojson + (
        f'{{"type":"Feature","id":"{b}","properties":{{"name":"{b}"}},'
        f'"geometry":{{"type":"Polygon","coordinates":[{x}]}}}},'
    )
geojson = geojson[:-1] + f']}}'
m = folium.Map(location=[latitude, longitude], zoom_start=10)
folium.Choropleth(
    geo_data=geojson,
    name='choropleth',
    data=borough_data["Parks"],
    key_on='feature.id',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Parks per borough',
    bins=9
).add_to(m)
folium.LayerControl().add_to(m)
m.fit_bounds([[bound_lat_min, bound_lon_min], [bound_lat_max, bound_lon_max]])
m
geojson = f'{{"type":"FeatureCollection","features":['
for x, b in zip(node_list_reverse, borough):
    geojson = geojson + (
        f'{{"type":"Feature","id":"{b}","properties":{{"name":"{b}"}},'
        f'"geometry":{{"type":"Polygon","coordinates":[{x}]}}}},'
    )
geojson = geojson[:-1] + f']}}'
m = folium.Map(location=[latitude, longitude], zoom_start=10)
folium.Choropleth(
    geo_data=geojson,
    name='choropleth',
    data=borough_data["Parks per square km"],
    key_on='feature.id',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Parks per square km',
    bins=9
).add_to(m)
folium.LayerControl().add_to(m)
m.fit_bounds([[bound_lat_min, bound_lon_min], [bound_lat_max, bound_lon_max]])
m
The next step is to get bicycle-related shops from OSM. We add the result to the comparison dataframe.
borough_bicycle_shop_counts = []
for b, i in zip(node_list_string, borough):
    # Count the bicycle shop nodes inside the borough polygon
    query_bicycle_shop_counts = (
        f'(node["shop"="bicycle"](poly:"{b}"););'
        'make statistics nwr_count=count(nwr);out;'
    )
    tmp = OverpassQuery(query_bicycle_shop_counts)
    borough_bicycle_shop_counts.append(int(tmp[0]["tags"]["nwr_count"]))
borough_data["Shops"] = borough_bicycle_shop_counts
borough_data
From OSM I also get bicycle-related destinations to explore, such as picnic places, viewpoints, or huts.
borough_bicycle_attractions = []
for b, i in zip(node_list_string, borough):
    # Count viewpoints, picnic places and huts inside the borough polygon
    query_bicycle_attractions = (
        f'(node["tourism"="viewpoint"](poly:"{b}");'
        f'node["tourism"="picnic_point"](poly:"{b}");node["tourism"="wilderness_hut"](poly:"{b}");'
        f'node["tourism"="alpine_hut"](poly:"{b}"););'
        'make statistics nwr_count=count(nwr);out;'
    )
    tmp = OverpassQuery(query_bicycle_attractions)
    borough_bicycle_attractions.append(int(tmp[0]["tags"]["nwr_count"]))
borough_data["Destinations"] = borough_bicycle_attractions
borough_data
In this section I create three clusters from the generated data. The three clusters represent a rating of locations for bicycle fans.
Cluster 1 is a very good location for bicycle fans
Cluster 2 is a good location for bicycle fans
Cluster 0 is not recommended for bicycle fans
# set number of clusters
kclusters = 3
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(borough_data)
#borough_data.insert(0, 'Cluster Labels', kmeans.labels_)
borough_data["Cluster Labels"] = kmeans.labels_
borough_data
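k-means groups rows by Euclidean distance, so columns with large numeric ranges (for example the length in kilometers) have more influence on the result than columns with small ranges. One possible preprocessing step, which is not applied in the cell above, is to standardize every column before clustering. The sketch below shows this with the standard library only; `standardize` is a hypothetical helper and the values are made up, not results from this notebook.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # A constant column carries no information for clustering
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Made-up values for illustration only.
lengths_km = [120.0, 300.0, 90.0]
shops = [2, 10, 4]
scaled = [standardize(lengths_km), standardize(shops)]
print(scaled)
```

After this step both columns are on a comparable scale, so neither dominates the distance computation.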
The results are tied closely to the boroughs of Toronto, which leaves a wide area in which to choose a new home. The boroughs differ greatly in size, so if you select, for example, Scarborough, you have more area to explore than in Old Toronto. To improve the results, Toronto could be divided into equal rectangles or squares and the same analysis repeated on them.
The purpose of this notebook was to help people who want to choose a new place to live based on bicycle-related attributes. It delivers a table with three clusters for selecting locations to explore as new places to live.
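The idea of dividing the city into equal rectangles could be sketched as follows. The function `make_grid`, the grid size, and the bounding box values are assumptions for illustration; in the notebook the real bounding box is available as `bound_lat_min`, `bound_lon_min`, `bound_lat_max`, and `bound_lon_max`.

```python
def make_grid(lat_min, lon_min, lat_max, lon_max, rows, cols):
    """Split a bounding box into rows x cols cells.

    Returns (lat_min, lon_min, lat_max, lon_max) tuples. The cells are equal
    in degrees, which is only approximately equal in area.
    """
    dlat = (lat_max - lat_min) / rows
    dlon = (lon_max - lon_min) / cols
    cells = []
    for r in range(rows):
        for c in range(cols):
            cells.append((lat_min + r * dlat, lon_min + c * dlon,
                          lat_min + (r + 1) * dlat, lon_min + (c + 1) * dlon))
    return cells


# Rough, assumed bounding box around Toronto.
cells = make_grid(43.58, -79.64, 43.86, -79.11, rows=4, cols=6)
print(len(cells))
```

Each cell could then take the place of a borough polygon in the queries, so that all units of comparison have a similar size.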